Dataset statistics
| Number of variables | 12 |
|---|---|
| Number of observations | 17703 |
| Missing cells | 0 |
| Missing cells (%) | 0.0% |
| Duplicate rows | 0 |
| Duplicate rows (%) | 0.0% |
| Total size in memory | 1.6 MiB |
| Average record size in memory | 96.0 B |
Variable types
| Numeric | 11 |
|---|---|
| DateTime | 1 |
gross_revenue is highly correlated with invoice_no | High correlation |
invoice_no is highly correlated with gross_revenue and 1 other fields | High correlation |
avg_ticket is highly correlated with qtde_returns and 1 other fields | High correlation |
frequency is highly correlated with invoice_no | High correlation |
qtde_returns is highly correlated with avg_ticket and 1 other fields | High correlation |
avg_basket_size is highly correlated with avg_ticket and 1 other fields | High correlation |
gross_revenue is highly correlated with recency_days and 5 other fields | High correlation |
recency_days is highly correlated with gross_revenue and 1 other fields | High correlation |
invoice_no is highly correlated with gross_revenue and 4 other fields | High correlation |
avg_ticket is highly correlated with avg_unique_basket_size | High correlation |
avg_recency_days is highly correlated with gross_revenue and 3 other fields | High correlation |
frequency is highly correlated with gross_revenue and 3 other fields | High correlation |
qtde_returns is highly correlated with gross_revenue and 3 other fields | High correlation |
avg_basket_size is highly correlated with gross_revenue | High correlation |
avg_unique_basket_size is highly correlated with avg_ticket | High correlation |
gross_revenue is highly correlated with invoice_no | High correlation |
invoice_no is highly correlated with gross_revenue and 2 other fields | High correlation |
avg_recency_days is highly correlated with invoice_no and 1 other fields | High correlation |
frequency is highly correlated with invoice_no and 1 other fields | High correlation |
frequency is highly correlated with gross_revenue and 1 other fields | High correlation |
gross_revenue is highly correlated with frequency and 1 other fields | High correlation |
qtde_returns is highly correlated with avg_basket_size and 1 other fields | High correlation |
avg_basket_size is highly correlated with qtde_returns and 1 other fields | High correlation |
avg_ticket is highly correlated with qtde_returns and 1 other fields | High correlation |
invoice_no is highly correlated with frequency and 1 other fields | High correlation |
avg_ticket is highly skewed (γ1 = 90.51563435) | Skewed |
qtde_returns is highly skewed (γ1 = 51.67714053) | Skewed |
avg_basket_size is highly skewed (γ1 = 49.74912676) | Skewed |
df_index has unique values | Unique |
recency_days has 635 (3.6%) zeros | Zeros |
qtde_returns has 5482 (31.0%) zeros | Zeros |
Reproduction
| Analysis started | 2021-05-28 00:25:16.595433 |
|---|---|
| Analysis finished | 2021-05-28 00:25:31.067358 |
| Duration | 14.47 seconds |
| Software version | pandas-profiling v3.0.0 |
df_index
| Distinct | 17703 |
|---|---|
| Distinct (%) | 100.0% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 10350.0871 |
| Minimum | 0 |
| Maximum | 20522 |
| Zeros | 1 |
| Zeros (%) | < 0.1% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 138.4 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 1050.1 |
| Q1 | 5280.5 |
| median | 10402 |
| Q3 | 15428 |
| 95-th percentile | 19539.9 |
| Maximum | 20522 |
| Range | 20522 |
| Interquartile range (IQR) | 10147.5 |
Descriptive statistics
| Standard deviation | 5906.864597 |
|---|---|
| Coefficient of variation (CV) | 0.5707067523 |
| Kurtosis | -1.186694207 |
| Mean | 10350.0871 |
| Median Absolute Deviation (MAD) | 5071 |
| Skewness | -0.01475611063 |
| Sum | 183227592 |
| Variance | 34891049.37 |
| Monotonicity | Strictly increasing |
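The derived figures in the quantile and descriptive tables above follow from a few simple relationships: the coefficient of variation is the standard deviation divided by the mean, the variance is the squared standard deviation, and the IQR is Q3 minus Q1. A minimal sketch that re-derives them from the values printed in those tables (the variable names are illustrative, not part of the report):

```python
# Values copied from the quantile and descriptive tables above.
mean = 10350.0871
std = 5906.864597
q1, q3 = 5280.5, 15428
minimum, maximum = 0, 20522

cv = std / mean                   # coefficient of variation
variance = std ** 2               # variance is the squared standard deviation
iqr = q3 - q1                     # interquartile range
value_range = maximum - minimum   # range

print(f"CV={cv:.10f} variance={variance:.2f} IQR={iqr} range={value_range}")
```

Small differences in the last digits are expected because the inputs are themselves rounded.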
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 1 | < 0.1% |
| 13692 | 1 | < 0.1% |
| 7561 | 1 | < 0.1% |
| 5512 | 1 | < 0.1% |
| 19843 | 1 | < 0.1% |
| 17794 | 1 | < 0.1% |
| 9598 | 1 | < 0.1% |
| 15741 | 1 | < 0.1% |
| 3451 | 1 | < 0.1% |
| 15725 | 1 | < 0.1% |
| Other values (17693) | 17693 | 99.9% |

| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 1 | < 0.1% |
| 1 | 1 | < 0.1% |
| 2 | 1 | < 0.1% |
| 3 | 1 | < 0.1% |
| 4 | 1 | < 0.1% |
| 5 | 1 | < 0.1% |
| 6 | 1 | < 0.1% |
| 7 | 1 | < 0.1% |
| 8 | 1 | < 0.1% |
| 9 | 1 | < 0.1% |

| Value | Count | Frequency (%) |
|---|---|---|
| 20522 | 1 | < 0.1% |
| 20521 | 1 | < 0.1% |
| 20520 | 1 | < 0.1% |
| 20519 | 1 | < 0.1% |
| 20518 | 1 | < 0.1% |
| 20517 | 1 | < 0.1% |
| 20515 | 1 | < 0.1% |
| 20514 | 1 | < 0.1% |
| 20513 | 1 | < 0.1% |
| 20512 | 1 | < 0.1% |
invoice_date
Date
| Distinct | 305 |
|---|---|
| Distinct (%) | 1.7% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 138.4 KiB |
| Minimum | 2016-11-29 00:00:00 |
| Maximum | 2017-12-07 00:00:00 |
Histogram with fixed size bins (bins=50)
customer_id
Real number (ℝ≥0)
| Distinct | 2970 |
|---|---|
| Distinct (%) | 16.8% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 15228.20725 |
| Minimum | 12347 |
| Maximum | 18287 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 138.4 KiB |
Quantile statistics
| Minimum | 12347 |
|---|---|
| 5-th percentile | 12627 |
| Q1 | 13750 |
| median | 15125 |
| Q3 | 16745 |
| 95-th percentile | 17949 |
| Maximum | 18287 |
| Range | 5940 |
| Interquartile range (IQR) | 2995 |
Descriptive statistics
| Standard deviation | 1731.917094 |
|---|---|
| Coefficient of variation (CV) | 0.1137308591 |
| Kurtosis | -1.220108827 |
| Mean | 15228.20725 |
| Median Absolute Deviation (MAD) | 1526 |
| Skewness | 0.07732577092 |
| Sum | 269584953 |
| Variance | 2999536.821 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
|---|---|---|
| 14911 | 144 | 0.8% |
| 12748 | 113 | 0.6% |
| 17841 | 113 | 0.6% |
| 15311 | 91 | 0.5% |
| 14606 | 88 | 0.5% |
| 13089 | 83 | 0.5% |
| 12971 | 71 | 0.4% |
| 16422 | 56 | 0.3% |
| 14527 | 55 | 0.3% |
| 13798 | 53 | 0.3% |
| Other values (2960) | 16836 | 95.1% |

| Value | Count | Frequency (%) |
|---|---|---|
| 12347 | 7 | < 0.1% |
| 12348 | 4 | < 0.1% |
| 12352 | 7 | < 0.1% |
| 12356 | 3 | < 0.1% |
| 12358 | 2 | < 0.1% |
| 12359 | 6 | < 0.1% |
| 12360 | 3 | < 0.1% |
| 12362 | 13 | 0.1% |
| 12364 | 4 | < 0.1% |
| 12370 | 4 | < 0.1% |

| Value | Count | Frequency (%) |
|---|---|---|
| 18287 | 3 | < 0.1% |
| 18283 | 14 | 0.1% |
| 18282 | 3 | < 0.1% |
| 18277 | 2 | < 0.1% |
| 18276 | 2 | < 0.1% |
| 18274 | 2 | < 0.1% |
| 18273 | 3 | < 0.1% |
| 18272 | 7 | < 0.1% |
| 18270 | 3 | < 0.1% |
| 18269 | 2 | < 0.1% |
gross_revenue
| Distinct | 2964 |
|---|---|
| Distinct (%) | 16.7% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 9161.586268 |
| Minimum | 6.2 |
| Maximum | 279138.02 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 138.4 KiB |
Quantile statistics
| Minimum | 6.2 |
|---|---|
| 5-th percentile | 360.01 |
| Q1 | 1098.43 |
| median | 2507.07 |
| Q3 | 5899.335 |
| 95-th percentile | 37153.85 |
| Maximum | 279138.02 |
| Range | 279131.82 |
| Interquartile range (IQR) | 4800.905 |
Descriptive statistics
| Standard deviation | 25772.32458 |
|---|---|
| Coefficient of variation (CV) | 2.8130854 |
| Kurtosis | 53.31618939 |
| Mean | 9161.586268 |
| Median Absolute Deviation (MAD) | 1736.97 |
| Skewness | 6.604744639 |
| Sum | 162187561.7 |
| Variance | 664212714 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
|---|---|---|
| 140450.72 | 144 | 0.8% |
| 40967.72 | 113 | 0.6% |
| 32317.32 | 113 | 0.6% |
| 60767.9 | 91 | 0.5% |
| 12021.65 | 88 | 0.5% |
| 58825.83 | 83 | 0.5% |
| 11189.91 | 71 | 0.4% |
| 34684.4 | 56 | 0.3% |
| 8507.82 | 55 | 0.3% |
| 37153.85 | 53 | 0.3% |
| Other values (2954) | 16836 | 95.1% |

| Value | Count | Frequency (%) |
|---|---|---|
| 6.2 | 2 | < 0.1% |
| 13.3 | 2 | < 0.1% |
| 15 | 2 | < 0.1% |
| 36.56 | 4 | < 0.1% |
| 45 | 2 | < 0.1% |
| 52 | 2 | < 0.1% |
| 52.2 | 2 | < 0.1% |
| 52.2 | 2 | < 0.1% |
| 62.43 | 2 | < 0.1% |
| 68.84 | 2 | < 0.1% |

| Value | Count | Frequency (%) |
|---|---|---|
| 279138.02 | 46 | 0.3% |
| 259657.3 | 26 | 0.1% |
| 194550.79 | 29 | 0.2% |
| 168472.5 | 2 | < 0.1% |
| 140450.72 | 144 | 0.8% |
| 124564.53 | 16 | 0.1% |
| 117379.63 | 51 | 0.3% |
| 91062.38 | 33 | 0.2% |
| 72882.09 | 38 | 0.2% |
| 66653.56 | 17 | 0.1% |
recency_days
| Distinct | 272 |
|---|---|
| Distinct (%) | 1.5% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 38.3940575 |
| Minimum | 0 |
| Maximum | 373 |
| Zeros | 635 |
| Zeros (%) | 3.6% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 138.4 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 1 |
| Q1 | 4 |
| median | 16 |
| Q3 | 43 |
| 95-th percentile | 175 |
| Maximum | 373 |
| Range | 373 |
| Interquartile range (IQR) | 39 |
Descriptive statistics
| Standard deviation | 59.21515027 |
|---|---|
| Coefficient of variation (CV) | 1.54229988 |
| Kurtosis | 8.092698024 |
| Mean | 38.3940575 |
| Median Absolute Deviation (MAD) | 14 |
| Skewness | 2.710942182 |
| Sum | 679690 |
| Variance | 3506.434022 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
|---|---|---|
| 1 | 1551 | 8.8% |
| 2 | 1091 | 6.2% |
| 3 | 908 | 5.1% |
| 4 | 771 | 4.4% |
| 8 | 681 | 3.8% |
| 0 | 635 | 3.6% |
| 10 | 529 | 3.0% |
| 9 | 528 | 3.0% |
| 7 | 501 | 2.8% |
| 17 | 434 | 2.5% |
| Other values (262) | 10074 | 56.9% |

| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 635 | 3.6% |
| 1 | 1551 | 8.8% |
| 2 | 1091 | 6.2% |
| 3 | 908 | 5.1% |
| 4 | 771 | 4.4% |
| 5 | 296 | 1.7% |
| 7 | 501 | 2.8% |
| 8 | 681 | 3.8% |
| 9 | 528 | 3.0% |
| 10 | 529 | 3.0% |

| Value | Count | Frequency (%) |
|---|---|---|
| 373 | 4 | < 0.1% |
| 372 | 9 | 0.1% |
| 371 | 2 | < 0.1% |
| 368 | 2 | < 0.1% |
| 366 | 9 | 0.1% |
| 365 | 4 | < 0.1% |
| 364 | 2 | < 0.1% |
| 360 | 3 | < 0.1% |
| 359 | 2 | < 0.1% |
| 358 | 10 | 0.1% |
invoice_no
| Distinct | 56 |
|---|---|
| Distinct (%) | 0.3% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 16.28870813 |
| Minimum | 1 |
| Maximum | 206 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 138.4 KiB |
Quantile statistics
| Minimum | 1 |
|---|---|
| 5-th percentile | 2 |
| Q1 | 4 |
| median | 7 |
| Q3 | 15 |
| 95-th percentile | 57 |
| Maximum | 206 |
| Range | 205 |
| Interquartile range (IQR) | 11 |
Descriptive statistics
| Standard deviation | 28.96384421 |
|---|---|
| Coefficient of variation (CV) | 1.778154779 |
| Kurtosis | 23.88118339 |
| Mean | 16.28870813 |
| Median Absolute Deviation (MAD) | 4 |
| Skewness | 4.552737187 |
| Sum | 288359 |
| Variance | 838.9042714 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
|---|---|---|
| 2 | 1800 | 10.2% |
| 4 | 1719 | 9.7% |
| 3 | 1635 | 9.2% |
| 5 | 1271 | 7.2% |
| 6 | 1118 | 6.3% |
| 7 | 1002 | 5.7% |
| 8 | 833 | 4.7% |
| 9 | 676 | 3.8% |
| 11 | 615 | 3.5% |
| 10 | 571 | 3.2% |
| Other values (46) | 6463 | 36.5% |

| Value | Count | Frequency (%) |
|---|---|---|
| 1 | 396 | 2.2% |
| 2 | 1800 | 10.2% |
| 3 | 1635 | 9.2% |
| 4 | 1719 | 9.7% |
| 5 | 1271 | 7.2% |
| 6 | 1118 | 6.3% |
| 7 | 1002 | 5.7% |
| 8 | 833 | 4.7% |
| 9 | 676 | 3.8% |
| 10 | 571 | 3.2% |

| Value | Count | Frequency (%) |
|---|---|---|
| 206 | 113 | 0.6% |
| 199 | 144 | 0.8% |
| 124 | 113 | 0.6% |
| 97 | 83 | 0.5% |
| 91 | 179 | 1.0% |
| 86 | 71 | 0.4% |
| 72 | 46 | 0.3% |
| 62 | 90 | 0.5% |
| 60 | 26 | 0.1% |
| 57 | 53 | 0.3% |
avg_ticket
| Distinct | 2970 |
|---|---|
| Distinct (%) | 16.8% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 44.02014368 |
| Minimum | 2.150588235 |
| Maximum | 56157.5 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 138.4 KiB |
Quantile statistics
| Minimum | 2.150588235 |
|---|---|
| 5-th percentile | 5.226807859 |
| Q1 | 13.99977723 |
| median | 19.12316667 |
| Q3 | 28.9025 |
| 95-th percentile | 114.5063732 |
| Maximum | 56157.5 |
| Range | 56155.34941 |
| Interquartile range (IQR) | 14.90272277 |
Descriptive statistics
| Standard deviation | 604.3794902 |
|---|---|
| Coefficient of variation (CV) | 13.72961194 |
| Kurtosis | 8395.639277 |
| Mean | 44.02014368 |
| Median Absolute Deviation (MAD) | 7.170833333 |
| Skewness | 90.51563435 |
| Sum | 779288.6036 |
| Variance | 365274.5682 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
|---|---|---|
| 24.75775075 | 144 | 0.8% |
| 7.056183406 | 113 | 0.6% |
| 5.226807859 | 113 | 0.6% |
| 25.54346364 | 91 | 0.5% |
| 4.455763529 | 88 | 0.5% |
| 32.35744224 | 83 | 0.5% |
| 36.68822951 | 71 | 0.4% |
| 93.99566396 | 56 | 0.3% |
| 8.761915551 | 55 | 0.3% |
| 106.4580229 | 53 | 0.3% |
| Other values (2960) | 16836 | 95.1% |

| Value | Count | Frequency (%) |
|---|---|---|
| 2.150588235 | 4 | < 0.1% |
| 2.4325 | 2 | < 0.1% |
| 2.462371134 | 2 | < 0.1% |
| 2.511241379 | 3 | < 0.1% |
| 2.515333333 | 2 | < 0.1% |
| 2.65 | 2 | < 0.1% |
| 2.656931818 | 2 | < 0.1% |
| 2.707598253 | 3 | < 0.1% |
| 2.760621572 | 8 | < 0.1% |
| 2.770464191 | 14 | 0.1% |

| Value | Count | Frequency (%) |
|---|---|---|
| 56157.5 | 2 | < 0.1% |
| 4453.43 | 2 | < 0.1% |
| 3202.92 | 3 | < 0.1% |
| 1687.2 | 3 | < 0.1% |
| 952.9875 | 3 | < 0.1% |
| 872.13 | 3 | < 0.1% |
| 841.0214493 | 29 | 0.2% |
| 651.1683333 | 4 | < 0.1% |
| 640 | 4 | < 0.1% |
| 624.4 | 2 | < 0.1% |
avg_recency_days
| Distinct | 1257 |
|---|---|
| Distinct (%) | 7.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 44.18228747 |
| Minimum | 1 |
| Maximum | 366 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 138.4 KiB |
Quantile statistics
| Minimum | 1 |
|---|---|
| 5-th percentile | 5.271428571 |
| Q1 | 17.33333333 |
| median | 31 |
| Q3 | 54.6 |
| 95-th percentile | 123 |
| Maximum | 366 |
| Range | 365 |
| Interquartile range (IQR) | 37.26666667 |
Descriptive statistics
| Standard deviation | 45.24071433 |
|---|---|
| Coefficient of variation (CV) | 1.023955909 |
| Kurtosis | 12.63302866 |
| Mean | 44.18228747 |
| Median Absolute Deviation (MAD) | 16.56 |
| Skewness | 3.011411808 |
| Sum | 782159.0351 |
| Variance | 2046.722233 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
|---|---|---|
| 2.601398601 | 144 | 0.8% |
| 3.330357143 | 113 | 0.6% |
| 3.321428571 | 113 | 0.6% |
| 35 | 95 | 0.5% |
| 4.144444444 | 91 | 0.5% |
| 4.275862069 | 88 | 0.5% |
| 4.475609756 | 83 | 0.5% |
| 70 | 81 | 0.5% |
| 14.72 | 78 | 0.4% |
| 22.375 | 77 | 0.4% |
| Other values (1247) | 16740 | 94.6% |

| Value | Count | Frequency (%) |
|---|---|---|
| 1 | 32 | 0.2% |
| 1.5 | 3 | < 0.1% |
| 2 | 26 | 0.1% |
| 2.5 | 3 | < 0.1% |
| 2.601398601 | 144 | 0.8% |
| 3 | 31 | 0.2% |
| 3.321428571 | 113 | 0.6% |
| 3.330357143 | 113 | 0.6% |
| 3.5 | 6 | < 0.1% |
| 4 | 50 | 0.3% |

| Value | Count | Frequency (%) |
|---|---|---|
| 366 | 2 | < 0.1% |
| 365 | 2 | < 0.1% |
| 363 | 2 | < 0.1% |
| 362 | 2 | < 0.1% |
| 357 | 4 | < 0.1% |
| 356 | 2 | < 0.1% |
| 355 | 4 | < 0.1% |
| 352 | 2 | < 0.1% |
| 351 | 4 | < 0.1% |
| 350 | 6 | < 0.1% |
frequency
| Distinct | 1350 |
|---|---|
| Distinct (%) | 7.6% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 0.07552646186 |
| Minimum | 0.005449591281 |
| Maximum | 3 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 138.4 KiB |
Quantile statistics
| Minimum | 0.005449591281 |
|---|---|
| 5-th percentile | 0.01290322581 |
| Q1 | 0.02508960573 |
| median | 0.04142011834 |
| Q3 | 0.07608695652 |
| 95-th percentile | 0.25 |
| Maximum | 3 |
| Range | 2.994550409 |
| Interquartile range (IQR) | 0.05099735079 |
Descriptive statistics
| Standard deviation | 0.117333527 |
|---|---|
| Coefficient of variation (CV) | 1.553541952 |
| Kurtosis | 76.64809037 |
| Mean | 0.07552646186 |
| Median Absolute Deviation (MAD) | 0.02071005917 |
| Skewness | 6.254189121 |
| Sum | 1337.044954 |
| Variance | 0.01376715655 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
|---|---|---|
| 0.6514745308 | 144 | 0.8% |
| 0.5802139037 | 113 | 0.6% |
| 0.4530831099 | 113 | 0.6% |
| 0.3155080214 | 91 | 0.5% |
| 0.3378016086 | 88 | 0.5% |
| 0.3206521739 | 83 | 0.5% |
| 0.2378378378 | 71 | 0.4% |
| 0.02777777778 | 71 | 0.4% |
| 0.03571428571 | 69 | 0.4% |
| 0.06666666667 | 66 | 0.4% |
| Other values (1340) | 16794 | 94.9% |

| Value | Count | Frequency (%) |
|---|---|---|
| 0.005449591281 | 2 | < 0.1% |
| 0.005464480874 | 2 | < 0.1% |
| 0.005494505495 | 2 | < 0.1% |
| 0.005509641873 | 2 | < 0.1% |
| 0.005586592179 | 4 | < 0.1% |
| 0.005602240896 | 2 | < 0.1% |
| 0.005617977528 | 4 | < 0.1% |
| 0.00566572238 | 2 | < 0.1% |
| 0.005681818182 | 4 | < 0.1% |
| 0.005698005698 | 6 | < 0.1% |

| Value | Count | Frequency (%) |
|---|---|---|
| 3 | 2 | < 0.1% |
| 2 | 2 | < 0.1% |
| 1.571428571 | 3 | < 0.1% |
| 1.5 | 6 | < 0.1% |
| 1 | 28 | 0.2% |
| 0.8333333333 | 3 | < 0.1% |
| 0.75 | 3 | < 0.1% |
| 0.6666666667 | 24 | 0.1% |
| 0.6514745308 | 144 | 0.8% |
| 0.6 | 2 | < 0.1% |
qtde_returns
| Distinct | 215 |
|---|---|
| Distinct (%) | 1.2% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 142.6579676 |
| Minimum | 0 |
| Maximum | 80995 |
| Zeros | 5482 |
| Zeros (%) | 31.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 138.4 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 0 |
| median | 6 |
| Q3 | 36 |
| 95-th percentile | 446 |
| Maximum | 80995 |
| Range | 80995 |
| Interquartile range (IQR) | 36 |
Descriptive statistics
| Standard deviation | 1062.494385 |
|---|---|
| Coefficient of variation (CV) | 7.447844678 |
| Kurtosis | 3799.981426 |
| Mean | 142.6579676 |
| Median Absolute Deviation (MAD) | 6 |
| Skewness | 51.67714053 |
| Sum | 2525474 |
| Variance | 1128894.317 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 5482 | 31.0% |
| 1 | 774 | 4.4% |
| 2 | 727 | 4.1% |
| 4 | 581 | 3.3% |
| 3 | 537 | 3.0% |
| 5 | 456 | 2.6% |
| 6 | 437 | 2.5% |
| 12 | 386 | 2.2% |
| 8 | 359 | 2.0% |
| 18 | 311 | 1.8% |
| Other values (205) | 7653 | 43.2% |

| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 5482 | 31.0% |
| 1 | 774 | 4.4% |
| 2 | 727 | 4.1% |
| 3 | 537 | 3.0% |
| 4 | 581 | 3.3% |
| 5 | 456 | 2.6% |
| 6 | 437 | 2.5% |
| 7 | 261 | 1.5% |
| 8 | 359 | 2.0% |
| 9 | 303 | 1.7% |

| Value | Count | Frequency (%) |
|---|---|---|
| 80995 | 2 | < 0.1% |
| 9360 | 16 | 0.1% |
| 9014 | 2 | < 0.1% |
| 8004 | 38 | 0.2% |
| 4427 | 14 | 0.1% |
| 3768 | 6 | < 0.1% |
| 3332 | 144 | 0.8% |
| 2878 | 29 | 0.2% |
| 2022 | 9 | 0.1% |
| 2012 | 24 | 0.1% |
avg_basket_size
| Distinct | 1980 |
|---|---|
| Distinct (%) | 11.2% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 270.1811978 |
| Minimum | 1 |
| Maximum | 40498.5 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 138.4 KiB |
Quantile statistics
| Minimum | 1 |
|---|---|
| 5-th percentile | 52.5 |
| Q1 | 116.5833333 |
| median | 188.5 |
| Q3 | 299.5 |
| 95-th percentile | 660.8627451 |
| Maximum | 40498.5 |
| Range | 40497.5 |
| Interquartile range (IQR) | 182.9166667 |
Descriptive statistics
| Standard deviation | 533.1357215 |
|---|---|
| Coefficient of variation (CV) | 1.973252491 |
| Kurtosis | 3667.449042 |
| Mean | 270.1811978 |
| Median Absolute Deviation (MAD) | 84.5 |
| Skewness | 49.74912676 |
| Sum | 4783017.745 |
| Variance | 284233.6976 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
|---|---|---|
| 403.3316583 | 144 | 0.8% |
| 123.8398058 | 113 | 0.6% |
| 185.9112903 | 113 | 0.6% |
| 419.7142857 | 91 | 0.5% |
| 68.2967033 | 88 | 0.5% |
| 320.3092784 | 83 | 0.5% |
| 108.0116279 | 71 | 0.4% |
| 660.8627451 | 56 | 0.3% |
| 37.94545455 | 55 | 0.3% |
| 420.1403509 | 53 | 0.3% |
| Other values (1970) | 16836 | 95.1% |

| Value | Count | Frequency (%) |
|---|---|---|
| 1 | 4 | < 0.1% |
| 2 | 2 | < 0.1% |
| 3.333333333 | 6 | < 0.1% |
| 5.333333333 | 4 | < 0.1% |
| 5.666666667 | 3 | < 0.1% |
| 6.142857143 | 5 | < 0.1% |
| 7.5 | 4 | < 0.1% |
| 9 | 2 | < 0.1% |
| 9.5 | 2 | < 0.1% |
| 11 | 3 | < 0.1% |

| Value | Count | Frequency (%) |
|---|---|---|
| 40498.5 | 2 | < 0.1% |
| 6009.333333 | 2 | < 0.1% |
| 4282 | 2 | < 0.1% |
| 3906 | 3 | < 0.1% |
| 3868.65 | 16 | 0.1% |
| 2880 | 6 | < 0.1% |
| 2801 | 2 | < 0.1% |
| 2733.944444 | 46 | 0.3% |
| 2518.769231 | 13 | 0.1% |
| 2160.333333 | 3 | < 0.1% |
avg_unique_basket_size
| Distinct | 1005 |
|---|---|
| Distinct (%) | 5.7% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 21.50874042 |
| Minimum | 1 |
| Maximum | 299.7058824 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 138.4 KiB |
Quantile statistics
| Minimum | 1 |
|---|---|
| 5-th percentile | 3.75862069 |
| Q1 | 10.24242424 |
| median | 17.25 |
| Q3 | 26.6 |
| 95-th percentile | 53.75 |
| Maximum | 299.7058824 |
| Range | 298.7058824 |
| Interquartile range (IQR) | 16.35757576 |
Descriptive statistics
| Standard deviation | 19.06981243 |
|---|---|
| Coefficient of variation (CV) | 0.8866075864 |
| Kurtosis | 51.04282444 |
| Mean | 21.50874042 |
| Median Absolute Deviation (MAD) | 7.75 |
| Skewness | 4.806136608 |
| Sum | 380769.2316 |
| Variance | 363.657746 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
|---|---|---|
| 13 | 200 | 1.1% |
| 11 | 146 | 0.8% |
| 28.50753769 | 144 | 0.8% |
| 16 | 138 | 0.8% |
| 14 | 114 | 0.6% |
| 63.20967742 | 113 | 0.6% |
| 22.23300971 | 113 | 0.6% |
| 9 | 112 | 0.6% |
| 18 | 107 | 0.6% |
| 17 | 102 | 0.6% |
| Other values (995) | 16414 | 92.7% |

| Value | Count | Frequency (%) |
|---|---|---|
| 1 | 97 | 0.5% |
| 1.2 | 5 | < 0.1% |
| 1.25 | 5 | < 0.1% |
| 1.333333333 | 7 | < 0.1% |
| 1.5 | 22 | 0.1% |
| 1.568181818 | 29 | 0.2% |
| 1.571428571 | 4 | < 0.1% |
| 1.666666667 | 16 | 0.1% |
| 1.833333333 | 5 | < 0.1% |
| 2 | 83 | 0.5% |

| Value | Count | Frequency (%) |
|---|---|---|
| 299.7058824 | 17 | 0.1% |
| 259 | 2 | < 0.1% |
| 203.5 | 4 | < 0.1% |
| 148 | 2 | < 0.1% |
| 145 | 3 | < 0.1% |
| 136.125 | 10 | 0.1% |
| 135.5 | 4 | < 0.1% |
| 127 | 2 | < 0.1% |
| 122 | 4 | < 0.1% |
| 118 | 3 | < 0.1% |
Pearson's r
Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1, with -1 indicating total negative linear correlation, 0 indicating no linear correlation, and +1 indicating total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, which implies that for a linear relationship the slope of the line does not affect the magnitude of r. To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
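The calculation described above can be sketched in plain Python. This is a minimal illustration, not the code the profiling tool runs; the function name `pearson_r` is hypothetical, and the sample values are the first five rows of `gross_revenue` and `invoice_no` from the "First rows" table, so the result is not the report's full-dataset correlation.

```python
import math

def pearson_r(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # The 1/n factors of the covariance and the standard deviations cancel,
    # so plain sums of products and squares are enough.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    # Note: a constant input gives zero variance and a ZeroDivisionError.
    return cov / math.sqrt(var_x * var_y)

# First five rows of the report's sample table (illustrative subset only).
gross_revenue = [5391.21, 3232.59, 6705.38, 948.25, 876.00]
invoice_no = [34.0, 9.0, 15.0, 5.0, 3.0]

r = pearson_r(gross_revenue, invoice_no)
# Shifting and positively rescaling one variable leaves r unchanged.
r_rescaled = pearson_r([2 * v + 100 for v in gross_revenue], invoice_no)
print(round(r, 4), round(r_rescaled, 4))
```

The second call demonstrates the location and scale invariance mentioned above: both printed values agree.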
Spearman's ρ
Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1, with -1 indicating total negative monotonic correlation, 0 indicating no monotonic correlation, and +1 indicating total positive monotonic correlation. To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
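As a rough sketch of that recipe, the snippet below ranks both variables and then applies the Pearson formula to the ranks. The helper names (`average_ranks`, `pearson_r`, `spearman_rho`) and the toy data are assumptions made for illustration; ties are given the average of their positions, which is a common convention but not something the report states.

```python
import math

def average_ranks(values):
    """Return 1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of sorted positions i..j, 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks

def pearson_r(x, y):
    """Covariance divided by the product of the standard deviations."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def spearman_rho(x, y):
    """Pearson's r computed on the rank variables of x and y."""
    return pearson_r(average_ranks(x), average_ranks(y))

# Toy data: y grows monotonically but not linearly with x.
x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]
rho = spearman_rho(x, y)
r = pearson_r(x, y)
print(rho, r)
```

On this toy data the ranks of `x` and `y` coincide, so ρ reaches its maximum while Pearson's r stays below it, which is the behavior the paragraph above describes.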
Kendall's τ
Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1, with -1 indicating total negative correlation, 0 indicating no correlation, and +1 indicating total positive correlation. To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is given by the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
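The pair-counting definition above translates directly into a short sketch. It implements the plain form exactly as described (often called tau-a); statistical libraries frequently report a tie-adjusted variant instead, so values may differ when the data contain ties. The function name `kendall_tau_a` and the toy data are illustrative assumptions.

```python
from itertools import combinations

def kendall_tau_a(x, y):
    """(concordant pairs - discordant pairs) / total number of pairs."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        product = (x1 - x2) * (y1 - y2)
        if product > 0:      # both variables move in the same direction
            concordant += 1
        elif product < 0:    # the variables move in opposite directions
            discordant += 1
        # product == 0 means a tie; it counts toward neither group
    total_pairs = n * (n - 1) // 2
    return (concordant - discordant) / total_pairs

# Toy data with one swapped pair: 5 concordant and 1 discordant out of 6 pairs.
x = [1, 2, 3, 4]
y = [1, 3, 2, 4]
tau = kendall_tau_a(x, y)
print(tau)
```

The quadratic loop over all pairs is fine for illustration; for a dataset the size of the one profiled here, a library implementation would be the practical choice.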
Phik (φk)
Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. Extensive documentation is available in the Phik project's documentation.
A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion by eye.
First rows
| | df_index | invoice_date | customer_id | gross_revenue | recency_days | invoice_no | avg_ticket | avg_recency_days | frequency | qtde_returns | avg_basket_size | avg_unique_basket_size |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 2016-11-29 | 17850 | 5391.21 | 372.0 | 34.0 | 18.152222 | 35.500000 | 0.486111 | 40.0 | 50.970588 | 8.735294 |
| 1 | 1 | 2016-11-29 | 13047 | 3232.59 | 56.0 | 9.0 | 18.904035 | 27.250000 | 0.048780 | 35.0 | 154.444444 | 19.000000 |
| 2 | 2 | 2016-11-29 | 12583 | 6705.38 | 2.0 | 15.0 | 28.902500 | 23.187500 | 0.045699 | 50.0 | 335.200000 | 15.466667 |
| 3 | 3 | 2016-11-29 | 13748 | 948.25 | 95.0 | 5.0 | 33.866071 | 92.666667 | 0.017921 | 0.0 | 87.800000 | 5.600000 |
| 4 | 4 | 2016-11-29 | 15100 | 876.00 | 333.0 | 3.0 | 292.000000 | 8.600000 | 0.136364 | 22.0 | 26.666667 | 1.000000 |
| 5 | 5 | 2016-11-29 | 15291 | 4623.30 | 25.0 | 14.0 | 45.326471 | 23.200000 | 0.054441 | 29.0 | 150.142857 | 7.285714 |
| 6 | 6 | 2016-11-29 | 14688 | 5630.87 | 7.0 | 21.0 | 17.219786 | 18.300000 | 0.073569 | 399.0 | 172.428571 | 15.571429 |
| 7 | 7 | 2016-11-29 | 17809 | 5411.91 | 16.0 | 12.0 | 88.719836 | 35.700000 | 0.039106 | 41.0 | 171.416667 | 5.083333 |
| 8 | 8 | 2016-11-29 | 15311 | 60767.90 | 0.0 | 91.0 | 25.543464 | 4.144444 | 0.315508 | 474.0 | 419.714286 | 26.142857 |
| 9 | 9 | 2016-11-29 | 16098 | 2005.63 | 87.0 | 7.0 | 29.934776 | 47.666667 | 0.024390 | 0.0 | 87.571429 | 9.571429 |
Last rows
| | df_index | invoice_date | customer_id | gross_revenue | recency_days | invoice_no | avg_ticket | avg_recency_days | frequency | qtde_returns | avg_basket_size | avg_unique_basket_size |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 17693 | 20512 | 2017-12-07 | 17315 | 6252.18 | 1.0 | 36.0 | 13.302511 | 10.200000 | 0.114525 | 32.0 | 107.277778 | 13.055556 |
| 17694 | 20513 | 2017-12-07 | 12662 | 3543.78 | 0.0 | 11.0 | 16.108091 | 37.300000 | 0.032086 | 18.0 | 185.545455 | 20.000000 |
| 17695 | 20514 | 2017-12-07 | 16705 | 14034.99 | 0.0 | 20.0 | 51.981444 | 17.900000 | 0.080780 | 18.0 | 273.800000 | 13.500000 |
| 17696 | 20515 | 2017-12-07 | 12526 | 1172.66 | 0.0 | 3.0 | 17.245000 | 47.000000 | 0.031579 | 0.0 | 208.000000 | 22.666667 |
| 17697 | 20517 | 2017-12-07 | 17581 | 11045.04 | 0.0 | 25.0 | 25.102364 | 24.800000 | 0.083110 | 66.0 | 237.080000 | 17.600000 |
| 17698 | 20518 | 2017-12-07 | 12748 | 32317.32 | 0.0 | 206.0 | 7.056183 | 3.330357 | 0.580214 | 1535.0 | 123.839806 | 22.233010 |
| 17699 | 20519 | 2017-12-07 | 13777 | 25977.16 | 0.0 | 33.0 | 131.863756 | 12.433333 | 0.106952 | 90.0 | 390.818182 | 5.969697 |
| 17700 | 20520 | 2017-12-07 | 15804 | 4206.39 | 0.0 | 13.0 | 16.054924 | 11.647059 | 0.095477 | 52.0 | 197.307692 | 20.153846 |
| 17701 | 20521 | 2017-12-07 | 13113 | 12245.96 | 0.0 | 24.0 | 61.229800 | 15.913043 | 0.106267 | 449.0 | 126.875000 | 8.333333 |
| 17702 | 20522 | 2017-12-07 | 12680 | 790.81 | 0.0 | 4.0 | 16.138980 | 37.666667 | 0.035088 | 0.0 | 109.750000 | 12.250000 |